Automated labeling in document images

Authors

  • Jongwoo Kim
  • Daniel X. Le
  • George R. Thoma
Abstract

The National Library of Medicine (NLM) is developing an automated system to produce bibliographic records for its MEDLINE database. This system, named Medical Article Record System (MARS), employs document image analysis and understanding techniques and optical character recognition (OCR). This paper describes a key module in MARS called the Automated Labeling (AL) module, which labels all zones of interest (title, author, affiliation, and abstract) automatically. The AL algorithm is based on 120 rules that are derived from an analysis of journal page layouts and features extracted from OCR output. Experiments carried out on more than 11,000 articles in over 1,000 biomedical journals show the accuracy of this rule-based algorithm to exceed 96%.

Keywords: OCR, automated data entry, automated zoning, automated labeling, rule-based algorithm, MARS, NLM
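
To make the idea of rule-based zone labeling concrete, here is a minimal sketch in Python. It is not the AL module: the abstract does not list the 120 rules, so the `Zone` fields, the four toy rules, their weights, and the keyword list below are all hypothetical and chosen only for illustration. Each rule tests layout or OCR-derived features of a zone and votes for a label; the label with the highest total score wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Zone:
    """A page zone with a few OCR-derived features (hypothetical fields)."""
    text: str         # OCR text of the zone
    top: float        # vertical position: 0.0 = top of page, 1.0 = bottom
    font_size: float  # average character height reported by OCR


@dataclass
class Rule:
    """One labeling rule: if `test` holds, add `score` to `label`."""
    label: str
    score: int
    test: Callable[[Zone, List[Zone]], bool]


# Hypothetical keyword list, not taken from the paper.
AFFILIATION_WORDS = ("university", "department", "institute", "hospital")


def is_largest_font(zone: Zone, zones: List[Zone]) -> bool:
    return zone.font_size == max(z.font_size for z in zones)


# Four toy rules standing in for the real rule set (assumptions throughout).
RULES = [
    Rule("title", 3, lambda z, zs: is_largest_font(z, zs) and z.top < 0.4),
    Rule("affiliation", 2,
         lambda z, zs: any(w in z.text.lower() for w in AFFILIATION_WORDS)),
    Rule("abstract", 2,
         lambda z, zs: z.text.lower().startswith("abstract")
         or len(z.text.split()) > 50),
    Rule("author", 1,
         lambda z, zs: ("," in z.text or " and " in z.text)
         and len(z.text.split()) <= 20),
]


def label_zone(zone: Zone, zones: List[Zone]) -> Optional[str]:
    """Return the best-scoring label for a zone, or None if no rule fires."""
    scores: Dict[str, int] = {}
    for rule in RULES:
        if rule.test(zone, zones):
            scores[rule.label] = scores.get(rule.label, 0) + rule.score
    if not scores:
        return None
    return max(scores, key=lambda label: scores[label])


def label_page(zones: List[Zone]) -> List[Optional[str]]:
    """Label every zone on a page using page-level context."""
    return [label_zone(z, zones) for z in zones]


if __name__ == "__main__":
    page = [
        Zone("Automated labeling in document images", 0.05, 18.0),
        Zone("Jongwoo Kim, Daniel X. Le, George R. Thoma", 0.12, 11.0),
        Zone("Department of Computer Science, Example University", 0.16, 10.0),
        Zone("Abstract. This paper describes a rule-based labeling module.",
             0.30, 10.0),
    ]
    for zone, label in zip(page, label_page(page)):
        print(label, "<-", zone.text[:40])
```

Scoring rather than first-match is a design choice made for this sketch: it lets several weak cues combine and lets a stronger cue (an affiliation keyword) outweigh a weaker one (a comma-separated line that could also be an author list).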

Similar resources

Automated Document Labeling Using Integrated Image and Neural Processing

As part of our effort to develop an automated data entry system to identify and convert bibliographic information from paper-based documents to electronic format for inclusion in the MEDLINE database used worldwide by biomedical researchers and clinicians, we have implemented a new technique for automatically labeling zones from scanned images with meaningful labels such as title, author, affi...

Face Detection with methods based on color by using Artificial Neural Network

Face detection methods are used to provide security. The problem with these methods is that faces cannot be easily categorized because of the great differences and variety among individuals' faces. In this paper, a face detection method based on skin color data is presented to overcome these problems. The researcher gathered a face database of 30 individuals consisting of ov...

A Semi-Automated Algorithm for Segmentation of the Left Atrial Appendage Landing Zone: Application in Left Atrial Appendage Occlusion Procedures

Background: Mechanical occlusion of the Left atrial appendage (LAA) using a purpose-built device has emerged as an effective prophylactic treatment in patients with atrial fibrillation at risk of stroke and a contraindication for anticoagulation. A crucial step in procedural planning is the choice of the device size. This is currently based on the manual analysis of the “Device Landing Zone” fr...

Automated classification of pulmonary nodules through a retrospective analysis of conventional CT and two-phase PET images in patients undergoing biopsy

Objective(s): Positron emission tomography/computed tomography (PET/CT) examination is commonly used for the evaluation of pulmonary nodules since it provides both anatomical and functional information. However, given the dependence of this evaluation on physician’s subjective judgment, the results could be variable. The purpose of this study was to develop an automated scheme for the classific...

Novel Automated Method for Minirhizotron Image Analysis: Root Detection using Curvelet Transform

In this article, a new method is introduced for distinguishing roots from background in minirhizotron images based on their digital curvelet transform. In the proposed method, a nonlinear mapping is applied to the sub-band curvelet components, followed by boundary detection using an energy optimization concept. The curvelet transform has an excellent capability for detecting roots with different orient...

Publication date: 2001